POEM: 1-Bit Point-Wise Operations Based on E-M for Point Cloud Processing


FIGURE 6.5

Illustration of training $w_i^j$ via Expectation-Maximization. Weights that already obey one specific distribution, i.e., those lower than the minimum mean value or higher than the maximum mean value, are left unconstrained. For the ones in the middle area (where the distribution is not apparent), we apply EM(·) to constrain them to converge to a specific distribution.

6.3.4 Optimization for POEM

In our POEM, what needs to be learned and updated are the unbinarized weights $\mathbf{w}_i$, the scale factor $\alpha_i$, and the other parameters $p_i$. These three kinds of parameters are jointly learned. In each Bi-FC layer, POEM sequentially updates the unbinarized weights $\mathbf{w}_i$ and the scale factor $\alpha_i$. For the other layers, we directly update the parameters $p_i$ through backpropagation.

Updating $\mathbf{w}_i$ via Expectation-Maximization: A conventional binarization framework learns the weights $\mathbf{w}_i$ based on Eq. 6.44. The gradient $\delta_{\mathbf{w}_i}$ corresponding to $\mathbf{w}_i$ is defined as

$$\delta_{\mathbf{w}_i} = \frac{\partial L_S}{\partial \mathbf{w}_i} + \lambda \frac{\partial L_R}{\partial \mathbf{w}_i}, \quad (6.45)$$

$$\mathbf{w}_i \leftarrow \mathbf{w}_i - \eta \, \delta_{\mathbf{w}_i}, \quad (6.46)$$

where $L_S$ and $L_R$ are loss functions, and $\eta$ is the learning rate. $\frac{\partial L_S}{\partial \mathbf{w}_i}$ can be computed by backpropagation, and, furthermore, we have

$$\frac{\partial L_R}{\partial \mathbf{w}_i} = (\mathbf{w}_i - \alpha_i \odot \mathbf{b}^{\mathbf{w}_i}) \odot \alpha_i. \quad (6.47)$$
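As a concrete reading of Eqs. 6.45–6.47, the following sketch performs one gradient step on the unbinarized weights of a single Bi-FC layer. The function name, the toy values of $\eta$ and $\lambda$, and the stand-in gradient grad_LS_w are illustrative assumptions, and the element-wise form of Eq. 6.47 follows the reconstruction above rather than a reference implementation.

```python
import numpy as np

def update_unbinarized_weights(w, alpha, grad_LS_w, eta=1e-3, lam=1e-4):
    """One gradient step on the unbinarized weights w_i of a Bi-FC layer.

    w         : unbinarized weights w_i (1-D array)
    alpha     : scale factor alpha_i (scalar or array broadcastable to w)
    grad_LS_w : dL_S/dw_i, assumed to come from backpropagation
    eta, lam  : learning rate and balance term lambda
    """
    b_w = np.sign(w)                          # binarized weights b^{w_i}
    grad_LR_w = (w - alpha * b_w) * alpha     # Eq. 6.47 (as reconstructed)
    delta_w = grad_LS_w + lam * grad_LR_w     # Eq. 6.45
    return w - eta * delta_w                  # Eq. 6.46

# Toy usage with random stand-ins for the weights and the backprop gradient.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, 256)
w_new = update_unbinarized_weights(w, alpha=np.mean(np.abs(w)),
                                   grad_LS_w=rng.normal(0.0, 0.01, 256))
```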

However, this backpropagation process without the necessary constraint will result in a Gaussian distribution of $\mathbf{w}_i$, which degrades the robustness of Bi-FCs as revealed in Eq. 6.80. Our POEM takes another learning objective as

$$\arg\min_{\mathbf{w}_i} \left\| \mathbf{b}^{\mathbf{w}_i} - \mathbf{b}^{\mathbf{w}_i + \gamma} \right\|. \quad (6.48)$$
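To make the robustness argument concrete, the small self-contained check below (not from the original work) compares how often the sign of a weight flips under a small disturbance when the weights are concentrated around zero versus spread into two modes away from zero; the distribution parameters are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma_std = 100_000, 0.02           # number of weights, disturbance scale

# Unconstrained training tends to leave weights concentrated near zero.
w_gaussian = rng.normal(0.0, 0.02, n)

# A bimodal distribution keeps most weights away from the sign boundary.
w_bimodal = rng.choice([-0.1, 0.1], n) + rng.normal(0.0, 0.02, n)

gamma = rng.normal(0.0, gamma_std, n)  # small disturbance applied to the weights

def flip_rate(w):
    """Fraction of binarized weights whose sign changes under the disturbance."""
    return np.mean(np.sign(w) != np.sign(w + gamma))

print(f"sign-flip rate, near-zero Gaussian weights: {flip_rate(w_gaussian):.3f}")
print(f"sign-flip rate, bimodal weights:            {flip_rate(w_bimodal):.3f}")
```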

To learn Bi-FCs capable of overcoming this obstacle, we introduce the EM algorithm into the update of $\mathbf{w}_i$. First, we assume that the ideal distribution of $\mathbf{w}_i$ should be bimodal.

Assumption 6.3.1. Every unbinarized weight of the $i$-th 1-bit layer, i.e., $w_i^j \in \mathbf{w}_i$, can be constrained to follow a Gaussian Mixture Model (GMM).
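Under this assumption, the weights of one layer can be fit with a two-component GMM via EM. The sketch below is a generic one-dimensional EM routine, offered only as a minimal illustration of the E- and M-steps; the function name fit_gmm_em and the symmetric initialization of the two means are assumptions, not details of POEM's EM(·) operator.

```python
import numpy as np

def fit_gmm_em(w, n_iter=50, eps=1e-8):
    """Fit a two-component 1-D Gaussian Mixture Model to the weights w via EM."""
    w = np.asarray(w, dtype=np.float64)
    scale = np.mean(np.abs(w)) + eps
    mu = np.array([-scale, scale])        # initialize the two means symmetrically
    var = np.full(2, np.var(w) + eps)     # component variances
    mix = np.full(2, 0.5)                 # mixing coefficients

    for _ in range(n_iter):
        # E-step: responsibilities r[j, k] = p(component k | weight j).
        diff = w[:, None] - mu[None, :]
        log_prob = -0.5 * (diff**2 / var + np.log(2 * np.pi * var)) + np.log(mix)
        log_prob -= log_prob.max(axis=1, keepdims=True)
        r = np.exp(log_prob)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: re-estimate means, variances, and mixing coefficients.
        nk = r.sum(axis=0) + eps
        mu = (r * w[:, None]).sum(axis=0) / nk
        var = (r * (w[:, None] - mu)**2).sum(axis=0) / nk + eps
        mix = nk / len(w)

    return mu, var, mix, r

# Example: weights of one Bi-FC layer (random stand-in for illustration).
w_layer = np.random.default_rng(0).normal(0.0, 0.05, 4096)
mu, var, mix, resp = fit_gmm_em(w_layer)
print("component means:", mu)
```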